Learning n-Ary Node Selecting Tree Transducers from Completely Annotated Examples
Authors
Abstract
We present the first algorithm for learning n-ary node selection queries in trees from completely annotated examples by methods of grammatical inference. We propose to represent n-ary queries by deterministic n-ary node selecting tree transducers (n-NSTTs). These are tree automata that capture the class of monadic second-order definable n-ary queries. We show that polynomially bounded n-ary queries defined by n-NSTTs can be learned with polynomial time and data. An application in Web information extraction yields encouraging results.
Similar resources
Learning Node Selecting Tree Transducer from Completely Annotated Examples
A basic problem in Web information extraction is to find appropriate queries for informative nodes in trees. We propose to learn queries for nodes in trees automatically from examples. We introduce node selecting tree transducers (NSTTs) and show how to induce deterministic NSTTs in polynomial time from completely annotated examples. We have implemented learning algorithms for NSTTs, started apply...
Interactive Learning of Node Selecting Tree Transducers
We develop new algorithms for learning monadic node selection queries in unranked trees from annotated examples, and apply them to visually interactive Web information extraction. We propose to represent monadic queries by bottom-up deterministic Node Selecting Tree Transducers (Nstts), a particular class of tree automata that we introduce. We prove that deterministic Nstts capture the class o...
Schema-Guided Induction of Monadic Queries
The induction of monadic node selecting queries from partially annotated XML-trees is a key task in Web information extraction. We show how to integrate schema guidance into an RPNI-based learning algorithm, in which monadic queries are represented by pruning node selecting tree transducers. We present experimental results on schema guidance by the DTD of HTML.
Query induction with schema-guided pruning strategies
Inference algorithms for tree automata that define node selecting queries in unranked trees rely on tree pruning strategies. These impose additional assumptions on node selection that are needed to compensate for small numbers of annotated examples. Pruning-based heuristics in query learning algorithms for Web information extraction often boost the learning quality and speed up the learning proc...
Learning Monadic Queries for Semi-Structured Documents from Positive Examples
Querying for nodes in trees is a core operation for information extraction from semi-structured documents in XML or HTML. We show that regular monadic queries for nodes in trees can be identified from positive examples, and this in polynomial time when represented by deterministic node selecting transducers that we introduce.
Publication date: 2006